Detailed Description Text - DETX (3):

The language input system employs a statistical language model to
achieve very high accuracy. In one exemplary implementation, the
language input architecture uses statistical language modeling with
automatic, maximum-likelihood-based methods to segment words, select a
lexicon, filter training data, and derive a best possible conversion
candidate.
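
The word-segmentation step can be illustrated with a minimal sketch. It
assumes a unigram language model and a dynamic-programming search for
the most likely split; the excerpt does not specify the model order or
the search procedure, so the function name segment, the word_probs
table, and the max_word_len and unknown_logp parameters are all
hypothetical.

```python
import math

def segment(text, word_probs, max_word_len=8, unknown_logp=-20.0):
    """Return the maximum-likelihood split of text under a unigram model.

    word_probs maps known words to probabilities (an assumed lexicon).
    Single unknown characters get unknown_logp so any string can be split.
    """
    n = len(text)
    # best[i] = (log-probability of the best split of text[:i], start of its last word)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            word = text[start:end]
            if word in word_probs:
                logp = math.log(word_probs[word])
            elif len(word) == 1:
                logp = unknown_logp
            else:
                continue
            score = best[start][0] + logp
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the string to recover the words.
    words = []
    i = n
    while i > 0:
        start = best[i][1]
        words.append(text[start:i])
        i = start
    return words[::-1]
```

Called with a small lexicon such as {"the": 0.3, "cat": 0.2, "sat": 0.2},
segment("thecatsat", ...) recovers the three words, because every
competing split has to pay the unknown-character penalty.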
Brief Summary Text - BSTX (4):

Exponential models (D. Beeferman, A. Berger and J. Lafferty, "Text
segmentation using exponential models" in Proc. Empirical Methods in
Natural Language Processing 2 (AAAI), 1997, Providence, R.I.) are built
by combining weighted binary features. The features are binary because
they provide a 1.0 score if they are present or a 0.0 score if not
present. A learning procedure (typically a greedy search) finds how to
weight each of these features to minimize the cross entropy between
segmented training data and the exponential model. These features are
typically cue-word features. Cue-word features detect the presence or
absence of specific words that tend to be used near the segment
boundaries. For example, in many broadcast programs, words or phrases
such as "and now the weather" or "reporting from" tend to indicate a
transition to a next topic.
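
A rough sketch of such a model follows, assuming a two-outcome
(boundary / no boundary) exponential model over binary cue-phrase
features. The cue list, the function names, and the training data are
hypothetical, and plain gradient descent stands in for the greedy
search mentioned above; it is used here only because it is short and
minimizes the same cross-entropy objective.

```python
import math

# Hypothetical cue phrases, taken from the examples in the text.
CUE_PHRASES = ["and now the weather", "reporting from"]

def features(window):
    """Binary cue features: 1.0 if the phrase occurs in the window, else 0.0."""
    text = window.lower()
    return [1.0 if cue in text else 0.0 for cue in CUE_PHRASES]

def boundary_prob(weights, bias, feats):
    """Two-outcome exponential model: p(boundary) = e^s / (1 + e^s)."""
    s = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-s))

def cross_entropy(weights, bias, data):
    """Mean cross entropy between the labels (1 = boundary) and the model."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for window, label in data:
        p = boundary_prob(weights, bias, features(window))
        total -= label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps)
    return total / len(data)

def train(data, steps=200, lr=0.5):
    """Fit the weights by batch gradient descent on the cross entropy."""
    weights = [0.0] * len(CUE_PHRASES)
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for window, label in data:
            feats = features(window)
            err = boundary_prob(weights, bias, feats) - label
            grad_b += err
            for i, f in enumerate(feats):
                grad_w[i] += err * f
        bias -= lr * grad_b / len(data)
        weights = [w - lr * g / len(data) for w, g in zip(weights, grad_w)]
    return weights, bias
```

After training on a few labeled windows, a window containing a cue
phrase should receive a boundary probability above 0.5, and a window
with no cue phrase a probability below 0.5.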
Brief Summary Text - BSTX (27):

The distance between a speech segment and a training token may be
determined by finding the optimal time alignment of the two using
dynamic programming techniques. Then, given the optimal alignment, the
squared Euclidean distances between aligned frames may be summed to
obtain an overall distance between the speech segment and the training
token. Penalties may be added to the raw distances to account for
differing numbers of frames in the speech segment and the training
token. A score then is generated based on the distances between the
speech segment and the training tokens. The score is a measure of the
match between the speech segment and the training data represented by
the training tokens, and may be determined as a function of the
distance of the speech segment from the k nearest training tokens,
where k may equal one.
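
The alignment and scoring described above can be sketched as follows.
This is one possible reading, not the patented method: it assumes a
standard dynamic time warping recurrence, a penalty proportional to the
difference in frame counts, and a score equal to the negative mean
distance to the k nearest tokens. The names dtw_distance, knn_score,
and length_penalty are hypothetical, and frames are plain lists of
floats.

```python
import math

def sq_euclidean(a, b):
    """Squared Euclidean distance between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dtw_distance(segment, token, length_penalty=0.0):
    """Sum of squared frame distances along the optimal time alignment.

    length_penalty is an assumed per-frame charge for differing lengths.
    """
    n, m = len(segment), len(token)
    # cost[i][j] = cheapest alignment of segment[:i] with token[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sq_euclidean(segment[i - 1], token[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # advance both
                                 cost[i - 1][j],      # repeat token frame
                                 cost[i][j - 1])      # repeat segment frame
    return cost[n][m] + length_penalty * abs(n - m)

def knn_score(segment, tokens, k=1, length_penalty=0.0):
    """Negative mean distance to the k nearest tokens (higher is a better match).

    Assumes tokens is non-empty.
    """
    dists = sorted(dtw_distance(segment, t, length_penalty) for t in tokens)
    nearest = dists[:k]
    return -sum(nearest) / len(nearest)
```

With k equal to one, the score reduces to the negated distance to the
single closest training token, matching the simplest case named in the
text.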